Prediction and Quantification of Splice Events from RNA-Seq Data
نویسندگان
چکیده
Analysis of splice variants from short read RNA-seq data remains a challenging problem. Here we present a novel method for the genome-guided prediction and quantification of splice events from RNA-seq data, which enables the analysis of unannotated and complex splice events. Splice junctions and exons are predicted from reads mapped to a reference genome and are assembled into a genome-wide splice graph. Splice events are identified recursively from the graph and are quantified locally based on reads extending across the start or end of each splice variant. We assess prediction accuracy based on simulated and real RNA-seq data, and illustrate how different read aligners (GSNAP, HISAT2, STAR, TopHat2) affect prediction results. We validate our approach for quantification based on simulated data, and compare local estimates of relative splice variant usage with those from other methods (MISO, Cufflinks) based on simulated and real RNA-seq data. In a proof-of-concept study of splice variants in 16 normal human tissues (Illumina Body Map 2.0) we identify 249 internal exons that belong to known genes but are not related to annotated exons. Using independent RNA samples from 14 matched normal human tissues, we validate 9/9 of these exons by RT-PCR and 216/249 by paired-end RNA-seq (2 x 250 bp). These results indicate that de novo prediction of splice variants remains beneficial even in well-studied systems. An implementation of our method is freely available as an R/Bioconductor package [Formula: see text].
منابع مشابه
Event-level prediction and quantification of transcript isoforms from RNA-seq data
RNA-seq data are informative for the analysis of known and novel transcript isoforms. While the short length of RNA-seq reads limits the ability to predict and quantify full-length transcripts, short read data are well suited for the analysis of individual alternative transcript events (e.g. inclusion or skipping of a cassette exon). Available event-centric methods typically rely on annotated t...
متن کاملUnbiased comparison of alignment tools for splice junction detection from RNA-Seq data
RNA-Seq makes it possible not only to measure gene expression but also to identify and quantify transcript isoforms in different experimental conditions. Although a large number of tools allow to infer isoform expression levels from RNA-Seq data, the quantification of alternative splicing variants remains challenging. Aligning reads that span (possibly un-annotated) exon-exon junctions in an ef...
متن کاملMapping eQTLs with RNA-seq reveals novel susceptibility genes, non-coding RNAs and alternative-splicing events in systemic lupus erythematosus
Studies attempting to functionally interpret complex-disease susceptibility loci by GWAS and eQTL integration have predominantly employed microarrays to quantify gene-expression. RNA-Seq has the potential to discover a more comprehensive set of eQTLs and illuminate the underlying molecular consequence. We examine the functional outcome of 39 variants associated with Systemic Lupus Erythematosus...
متن کاملInference of alternative splicing from RNA-Seq data with probabilistic splice graphs
MOTIVATION Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large ...
متن کاملDetection of splice junctions from paired-end RNA-seq data by SpliceMap
Alternative splicing is a prevalent post-transcriptional process, which is not only important to normal cellular function but is also involved in human diseases. The newly developed second generation sequencing technique provides high-throughput data (RNA-seq data) to study alternative splicing events in different types of cells. Here, we present a computational method, SpliceMap, to detect spl...
متن کامل